OCR exploration server

Warning: this site is under development!
Warning: this site was generated by automated means from raw corpora.
The information has therefore not been validated.

Composite document analysis by means of typographic characteristics

Internal identifier: 000317 (France/Analysis); previous: 000316; next: 000318

Authors: Laurence Duffy; Frank Lebourgeois; Hubert Emptoz [France]

Source :

RBID : ISTEX:332F277976CC0117A5E8758C2755BA5958D3D54F

Abstract

Abstract: We have presented a new method for grouping letters and words into homogeneous font families that does not require explicitly recognising the font. This analysis, achieved by applying a pattern redundancy technique, allows us to extract part of the logical information carried by the typographic features of words. After differentiating, grouping and comparing the typographic families, we know: - the cardinality of each family, - its weight (boldness), slope and size relative to the other families. Studying the organisation of the typographic families and their relative characteristics will allow us to classify families according to their logical significance, and so to formulate, where possible, hypotheses about the logical meaning of the families. A comparison between the constructed families and the learned grammar will then validate or correct these hypotheses, and label the families for which no hypothesis has been formulated. The significance of the method we have developed is that each process depends only on the image; it does not depend on the document type or on a font database. The method can therefore be applied to every document type, especially complex and typographically rich documents. A further benefit is that our text markers will be used to describe the document in HTML.
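
As a rough illustration of the kind of output the abstract describes (the cardinality of each family, plus its weight, slope and size relative to the other families), here is a minimal Python sketch. It is not the authors' method: the pattern redundancy technique is not detailed in this record, so the sketch assumes each word already carries a family label and three measured features. The `Word` record, its field names, the choice of "difference from the average of the family means" as the comparison, and the sample values are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Word:
    # Hypothetical record: the family label is assumed to be known already.
    family: str
    stroke_width: float  # stand-in for weight (boldness), in pixels
    slant_deg: float     # stand-in for slope, in degrees
    height: float        # stand-in for size, in pixels


def summarise_families(words):
    """Per family: cardinality and mean features relative to the
    average of all per-family means (an assumed comparison rule)."""
    groups = defaultdict(list)
    for w in words:
        groups[w.family].append(w)

    means = {
        name: {
            "weight": mean(w.stroke_width for w in ws),
            "slope": mean(w.slant_deg for w in ws),
            "size": mean(w.height for w in ws),
        }
        for name, ws in groups.items()
    }
    # Reference point: average of the per-family means for each feature.
    ref = {k: mean(m[k] for m in means.values())
           for k in ("weight", "slope", "size")}

    return {
        name: {
            "cardinality": len(groups[name]),
            "weight": m["weight"] - ref["weight"],
            "slope": m["slope"] - ref["slope"],
            "size": m["size"] - ref["size"],
        }
        for name, m in means.items()
    }


# Made-up sample: three body-text words and one larger, bolder word.
words = [
    Word("A", 2.0, 0.0, 10.0),
    Word("A", 2.0, 0.0, 10.0),
    Word("A", 2.0, 0.0, 10.0),
    Word("B", 4.0, 0.0, 20.0),
]
summary = summarise_families(words)
for name, stats in summary.items():
    print(name, stats)
```

In this toy data the small, bold, large family "B" could be read as a heading candidate and the frequent family "A" as body text, which is the sort of hypothesis the abstract says is later checked against a learned grammar.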

Url:
DOI: 10.1007/3-540-63791-5_14


The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Composite document analysis by means of typographic characteristics</title>
<author>
<name sortKey="Duffy, Laurence" sort="Duffy, Laurence" uniqKey="Duffy L" first="Laurence" last="Duffy">Laurence Duffy</name>
</author>
<author>
<name sortKey="Lebourgeois, Frank" sort="Lebourgeois, Frank" uniqKey="Lebourgeois F" first="Frank" last="Lebourgeois">Frank Lebourgeois</name>
</author>
<author>
<name sortKey="Emptoz, Hubert" sort="Emptoz, Hubert" uniqKey="Emptoz H" first="Hubert" last="Emptoz">Hubert Emptoz</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:332F277976CC0117A5E8758C2755BA5958D3D54F</idno>
<date when="1997" year="1997">1997</date>
<idno type="doi">10.1007/3-540-63791-5_14</idno>
<idno type="url">https://api.istex.fr/document/332F277976CC0117A5E8758C2755BA5958D3D54F/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000436</idno>
<idno type="wicri:Area/Istex/Curation">000429</idno>
<idno type="wicri:Area/Istex/Checkpoint">001986</idno>
<idno type="wicri:doubleKey">0302-9743:1997:Duffy L:composite:document:analysis</idno>
<idno type="wicri:Area/Main/Merge">002665</idno>
<idno type="wicri:Area/Main/Curation">002534</idno>
<idno type="wicri:Area/Main/Exploration">002534</idno>
<idno type="wicri:Area/France/Extraction">000317</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Composite document analysis by means of typographic characteristics</title>
<author>
<name sortKey="Duffy, Laurence" sort="Duffy, Laurence" uniqKey="Duffy L" first="Laurence" last="Duffy">Laurence Duffy</name>
</author>
<author>
<name sortKey="Lebourgeois, Frank" sort="Lebourgeois, Frank" uniqKey="Lebourgeois F" first="Frank" last="Lebourgeois">Frank Lebourgeois</name>
</author>
<author>
<name sortKey="Emptoz, Hubert" sort="Emptoz, Hubert" uniqKey="Emptoz H" first="Hubert" last="Emptoz">Hubert Emptoz</name>
<affiliation wicri:level="1">
<country wicri:rule="url">France</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>1997</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">332F277976CC0117A5E8758C2755BA5958D3D54F</idno>
<idno type="DOI">10.1007/3-540-63791-5_14</idno>
<idno type="ChapterID">14</idno>
<idno type="ChapterID">Chap14</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: We have presented a new method for grouping letters and words into homogeneous font families that does not require explicitly recognising the font. This analysis, achieved by applying a pattern redundancy technique, allows us to extract part of the logical information carried by the typographic features of words. After differentiating, grouping and comparing the typographic families, we know: - the cardinality of each family, - its weight (boldness), slope and size relative to the other families. Studying the organisation of the typographic families and their relative characteristics will allow us to classify families according to their logical significance, and so to formulate, where possible, hypotheses about the logical meaning of the families. A comparison between the constructed families and the learned grammar will then validate or correct these hypotheses, and label the families for which no hypothesis has been formulated. The significance of the method we have developed is that each process depends only on the image; it does not depend on the document type or on a font database. The method can therefore be applied to every document type, especially complex and typographically rich documents. A further benefit is that our text markers will be used to describe the document in HTML.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>France</li>
</country>
</list>
<tree>
<noCountry>
<name sortKey="Duffy, Laurence" sort="Duffy, Laurence" uniqKey="Duffy L" first="Laurence" last="Duffy">Laurence Duffy</name>
<name sortKey="Lebourgeois, Frank" sort="Lebourgeois, Frank" uniqKey="Lebourgeois F" first="Frank" last="Lebourgeois">Frank Lebourgeois</name>
</noCountry>
<country name="France">
<noRegion>
<name sortKey="Emptoz, Hubert" sort="Emptoz, Hubert" uniqKey="Emptoz H" first="Hubert" last="Emptoz">Hubert Emptoz</name>
</noRegion>
</country>
</tree>
</affiliations>
</record>
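
The record above can be read with a standard XML parser. The following Python sketch, using only the standard library, pulls out the title, authors, year and DOI. It works on a trimmed copy of the record so that it stays self-contained. One assumption is needed: the `wicri:` prefix is not declared anywhere in the page, so the sketch binds it to a placeholder namespace URI (`urn:example:wicri`, invented for this example) to let the parser accept the document. The function name `read_record` is also hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the record. The xmlns:wicri declaration is an assumption:
# the page does not declare the prefix, and the URI here is a placeholder.
RECORD = """<record xmlns:wicri="urn:example:wicri">
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Composite document analysis by means of typographic characteristics</title>
<author><name sortKey="Duffy, Laurence">Laurence Duffy</name></author>
<author><name sortKey="Lebourgeois, Frank">Frank Lebourgeois</name></author>
<author><name sortKey="Emptoz, Hubert">Hubert Emptoz</name></author>
</titleStmt>
<publicationStmt>
<date when="1997" year="1997">1997</date>
<idno type="doi">10.1007/3-540-63791-5_14</idno>
</publicationStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""


def read_record(xml_text):
    """Extract a few bibliographic fields from a Wicri-style record."""
    root = ET.fromstring(xml_text)
    stmt = root.find(".//titleStmt")
    pub = root.find(".//publicationStmt")
    return {
        "title": stmt.findtext("title"),
        "authors": [name.text for name in stmt.findall("author/name")],
        "year": pub.findtext("date"),
        # ElementTree supports simple attribute predicates in paths.
        "doi": pub.findtext("idno[@type='doi']"),
    }


info = read_record(RECORD)
print(info["title"])
print("; ".join(info["authors"]), info["year"], info["doi"])
```

To run this against the full record, the same placeholder declaration would have to be added to the `<record>` start tag first, since an undeclared prefix makes the parser reject the input.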

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/France/Analysis
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000317 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/France/Analysis/biblio.hfd -nk 000317 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    France
   |étape=   Analysis
   |type=    RBID
   |clé=     ISTEX:332F277976CC0117A5E8758C2755BA5958D3D54F
   |texte=   Composite document analysis by means of typographic characteristics
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024